Data-Driven Joint Debugging of the DBpedia Mappings and Ontology - Towards Addressing the Causes Instead of the Symptoms of Data Quality in DBpedia

نویسنده

  • Heiko Paulheim
چکیده

DBpedia is a large-scale, cross-domain knowledge graph extracted from Wikipedia. For the extraction, crowd-sourced mappings from Wikipedia infoboxes to the DBpedia ontology are utilized. In this process, different problems may arise: users may create wrong and/or inconsistent mappings, use the ontology in an unforeseen way, or change the ontology without considering all possible consequences. In this paper, we present a data-driven approach to discover problems in mappings as well as in the ontology and its usage in a joint, data-driven process. We show both quantitative and qualitative results about the problems identified, and derive proposals for altering mappings and refactoring the DBpedia ontology.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

DBpedia Mappings Quality Assessment

The root of schema violations for rdf data generated from (semi-)structured data, often derives from mappings, which are repeatedly applied and specify how an rdf dataset is generated. The dbpedia dataset, which derives from Wikipedia infoboxes, is no exception. To mitigate the violations, we proposed in previous work to validate the mappings which generate the data, instead of validating the g...

متن کامل

Updating Wikipedia via DBpedia Mappings and SPARQL

DBpedia crystallized most of the concepts of the Semantic Web using simple mappings to convert Wikipedia articles (i.e., infoboxes and tables) to RDF data. This “semantic view” of wiki content has rapidly become the focal point of the Linked Open Data cloud, but its impact on the original Wikipedia source is limited. In particular, little attention has been paid to the benefits that the semanti...

متن کامل

Test-driven Assessment of [R2]RML Mappings to Improve Dataset Quality

rdf dataset quality assessment is currently performed primarily after data is published. Incorporating its results, by applying corresponding adjustments to the dataset, happens manually and occurs rarely. In the case of (semi-)structured data (e.g., csv, xml), the root of the violations often derives from the mappings that specify how the rdf dataset will be generated. Thus, we suggest shiftin...

متن کامل

Extending DBpedia with Wikipedia List Pages

Thanks to its wide coverage and general-purpose ontology, DBpedia is a prominent dataset in the Linked Open Data cloud. DBpedia’s content is harvested from Wikipedia’s infoboxes, based on manually created mappings. In this paper, we explore the use of a promising source of knowledge for extending DBpedia, i.e., Wikipedia’s list pages. We discuss how a combination of frequent pattern mining and ...

متن کامل

Assessing and Refining Mappings to RDF to Improve Dataset Quality

rdf dataset quality assessment is currently performed primarily after data is published. However, there is neither a systematic way to incorporate its results into the dataset nor the assessment into the publishing workflow. Adjustments are manually –but rarely– applied. Nevertheless, the root of the violations which often derive from the mappings that specify how the rdf dataset will be genera...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017